A Counter-Example to the Mismatched 
Decoding Converse for Binary-Input 
Discrete Memoryless Channels 

Jonathan Scarlett, Anelia Somekh-Baruch, Alfonso Martinez and Albert Guillen i Fabregas 


Abstract 

This paper studies the mismatched decoding problem for binary-input discrete memoryless channels. An example 
is provided for which an achievable rate based on superposition coding exceeds the LM rate (Hui, 1983; Csiszar- 
Korner, 1981), thus providing a counter-example to a previously reported converse result (Balakirsky, 1995). Both 
numerical evaluations and theoretical results are used in establishing this claim. 


I. Introduction 

In this paper, we consider the problem of channel coding with a given (possibly suboptimal) decoding rule, 
i.e. mismatched decoding. This problem is of significant interest in settings where the optimal decoder is 
ruled out due to channel uncertainty or implementation constraints, and it also has several connections to theoretical 
problems such as zero-error capacity. Finding a single-letter expression for the channel capacity with mismatched 
decoding is a long-standing open problem, and is believed to be very difficult; the vast majority of the literature 
has focused on achievability results. The only reported single-letter converse result for general decoding metrics is 
that of Balakirsky [13], who considered binary-input discrete memoryless channels (DMCs) and stated a matching 
converse to the achievable rate of Hui [1] and Csiszar-Korner [2]. However, in the present paper, we provide a 
counter-example to this converse, i.e. a binary-input DMC for which this rate can be exceeded. 

We proceed by describing the problem setup. The encoder and decoder share a codebook $\mathcal{C} = \{x^{(1)}, \dots, x^{(M)}\}$ 
containing $M$ codewords of length $n$. The encoder receives a message $m$ equiprobable on the set $\{1, \dots, M\}$ and 
transmits $x^{(m)}$. The output sequence $y$ is generated according to $W^n(y|x) = \prod_{i=1}^{n} W(y_i|x_i)$, where $W$ is a 
single-letter transition law from $\mathcal{X}$ to $\mathcal{Y}$. The alphabets are assumed to be finite, and hence the channel is a DMC. 
Given the output sequence $y$, an estimate of the message is formed as follows: 

$$ \hat{m} = \arg\max_{j} q^n(x^{(j)}, y), $$    (1)

where $q^n(x, y) = \prod_{i=1}^{n} q(x_i, y_i)$ for some non-negative function $q$ called the decoding metric. An error is said to 
have occurred if $\hat{m}$ differs from $m$, and the error probability is denoted by 

$$ p_e = \mathbb{P}[\hat{m} \ne m]. $$    (2)

We assume that ties are broken as errors. A rate $R$ is said to be achievable if, for all $\delta > 0$, there exists a sequence 
of codebooks with $M \ge e^{n(R-\delta)}$ codewords having vanishing error probability under the decoding rule in (1). The 
mismatched capacity of $(W, q)$ is defined to be the supremum of all achievable rates, and is denoted by $C_M$. 
In this paper, we focus on binary-input DMCs, and we will be primarily interested in the achievable rates based 
on constant-composition codes due to Hui [1] and Csiszar and Korner [2], an achievable rate based on superposition 
coding by the present authors [6]-[8], and a reported converse by Balakirsky [13]. These are introduced in Sections 
I-B and I-C. 

A. Notation 

The set of all probability mass functions (PMFs) on a given finite alphabet, say $\mathcal{X}$, is denoted by $\mathcal{P}(\mathcal{X})$, and 
similarly for conditional distributions (e.g. $\mathcal{P}(\mathcal{Y}|\mathcal{X})$). The marginals of a joint distribution $P_{XY}(x,y)$ are denoted 
by $P_X(x)$ and $P_Y(y)$. Similarly, $P_{Y|X}(y|x)$ denotes the conditional distribution induced by $P_{XY}(x,y)$. We write 
$P_X = \tilde{P}_X$ to denote element-wise equality between two probability distributions on the same alphabet. Expectation 
with respect to a distribution $P_X(x)$ is denoted by $\mathbb{E}_P[\cdot]$. Given a distribution $Q(x)$ and a conditional distribution 
$W(y|x)$, the joint distribution $Q(x)W(y|x)$ is denoted by $Q \times W$. Information-theoretic quantities with respect to 
a given distribution (e.g. $P_{XY}(x,y)$) are written using a subscript (e.g. $I_P(X;Y)$). All logarithms have base $e$, and 
all rates are in nats/use. 

B. Achievability 

The most well-known achievable rate in the literature, and the one of most interest in this paper, is the LM 
rate, which is given as follows for an arbitrary input distribution $Q \in \mathcal{P}(\mathcal{X})$: 

$$ I_{LM}(Q) = \min_{\substack{\tilde{P}_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) \,:\, \tilde{P}_X = Q,\ \tilde{P}_Y = P_Y, \\ \mathbb{E}_{\tilde{P}}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} I_{\tilde{P}}(X;Y), $$    (3)

where $P_{XY} = Q \times W$. This rate was derived independently by Hui [1] and Csiszar-Korner [2]. The proof uses 
a standard random coding construction in which each codeword is independently drawn according to the uniform 
distribution on a given type class. The following alternative expression was given by Merhav et al. [4] using 
Lagrange duality: 

$$ I_{LM}(Q) = \sup_{s \ge 0,\, a(\cdot)} \sum_{x,y} Q(x) W(y|x) \log \frac{q(x,y)^s e^{a(x)}}{\sum_{\bar{x}} Q(\bar{x}) q(\bar{x},y)^s e^{a(\bar{x})}}. $$    (4)

Since the input distribution $Q$ is arbitrary, we can optimize it to obtain the achievable rate $C_{LM} = \max_Q I_{LM}(Q)$. 
In general, $C_M$ may be strictly higher than $C_{LM}$. 

The first approach to obtaining achievable rates exceeding $C_{LM}$ was given in [2]. The idea is to code over pairs 
of symbols: If a rate $R$ is achievable for the channel $W^{(2)}((y_1,y_2)|(x_1,x_2)) = W(y_1|x_1)W(y_2|x_2)$ with the metric 
$q^{(2)}((x_1,x_2),(y_1,y_2)) = q(x_1,y_1)q(x_2,y_2)$, then $R/2$ is achievable for the original channel $W$ with the metric $q$. 
Thus, one can apply the LM rate to $(W^{(2)}, q^{(2)})$, optimize the input distribution on the product alphabet, and infer 
an achievable rate for $(W, q)$; we denote this rate by $C_{LM}^{(2)}$. An example was given in [2] for which $C_{LM}^{(2)} > C_{LM}$. 
Moreover, as stated in [2], the preceding arguments can be applied to the $k$-th order product channel for $k > 2$; we 
denote the corresponding achievable rate by $C_{LM}^{(k)}$. It was conjectured in [2] that $\lim_{k \to \infty} C_{LM}^{(k)} = C_M$. It should be 
noted that the computation of $C_{LM}^{(k)}$ is generally prohibitively complex even for relatively small values of $k$, since 
$I_{LM}(Q)$ is non-concave in general [10]. 
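The product construction above can be made concrete with a short Python sketch. It builds the $k$-th order product channel and metric from single-letter matrices, using the channel and metric of Counter-Example 1 as illustrative inputs. The function name `product_channel` and the list-of-lists representation are assumptions made for this illustration and are not taken from the authors' code.

```python
from itertools import product

# Single-letter channel W(y|x) and metric q(x,y); the values are those of
# Counter-Example 1 and serve only as illustrative inputs.
W = [[0.97, 0.03, 0.0], [0.1, 0.1, 0.8]]
q = [[1.0, 1.0, 1.0], [1.0, 0.5, 1.36]]


def product_channel(W, q, k=2):
    """Return (inputs, outputs, W_k, q_k) for the k-th order product channel.

    Hypothetical helper: inputs and outputs are k-tuples of symbols, and the
    product channel and metric are products of the single-letter entries.
    """
    nx, ny = len(W), len(W[0])
    inputs = list(product(range(nx), repeat=k))
    outputs = list(product(range(ny), repeat=k))
    Wk = [[1.0] * len(outputs) for _ in inputs]
    qk = [[1.0] * len(outputs) for _ in inputs]
    for i, xs in enumerate(inputs):
        for j, ys in enumerate(outputs):
            for x, y in zip(xs, ys):
                Wk[i][j] *= W[x][y]  # W^(k) is a product of transition probabilities
                qk[i][j] *= q[x][y]  # q^(k) is a product of metric values
    return inputs, outputs, Wk, qk


inputs2, outputs2, W2, q2 = product_channel(W, q, k=2)
```

For $k = 2$ the inputs are ordered as (0,0), (0,1), (1,0), (1,1), and any rate computed for the pair `(W2, q2)` must be divided by two to obtain a rate for the original channel.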

Another approach to improving on $C_{LM}$ is to use multi-user random coding ensembles exhibiting more structure 
than the standard ensemble containing independent codewords. This idea was first proposed by Lapidoth [5], who 
used parallel coding techniques to provide an example where $C_M = C$ (with $C$ being the matched capacity) but 
$C_{LM} < C$. Building on these ideas, further achievable rates were provided by the present authors [6]-[8] using 
superposition coding techniques. Of particular interest in this paper is the following. For any finite auxiliary alphabet 
$\mathcal{U}$ and input distribution $Q_{UX}$, the rate $R = R_0 + R_1$ is achievable for any $(R_0, R_1)$ satisfying 

$$ R_1 \le \min_{\substack{\tilde{P}_{UXY} \in \mathcal{P}(\mathcal{U} \times \mathcal{X} \times \mathcal{Y}) \,:\, \tilde{P}_{UX} = Q_{UX},\ \tilde{P}_{UY} = P_{UY}, \\ \mathbb{E}_{\tilde{P}}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} I_{\tilde{P}}(X;Y|U) $$    (5)

$$ R_0 \le \min_{\substack{\tilde{P}_{UXY} \in \mathcal{P}(\mathcal{U} \times \mathcal{X} \times \mathcal{Y}) \,:\, \tilde{P}_{UX} = Q_{UX},\ \tilde{P}_{Y} = P_{Y}, \\ \mathbb{E}_{\tilde{P}}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} I_{\tilde{P}}(U;Y) + \big[ I_{\tilde{P}}(X;Y|U) - R_1 \big]^+, $$    (6)

where $P_{UXY} = Q_{UX} \times W$. We define $I_{SC}(Q_{UX})$ to be the maximum of $R_0 + R_1$ subject to these constraints, 
and we write the optimized rate as $C_{SC} = \sup_{\mathcal{U}, Q_{UX}} I_{SC}(Q_{UX})$. We also note the following dual expressions for 
(5)-(6): 

$$ R_1 \le \sup_{s \ge 0,\, a(\cdot,\cdot)} \sum_{u,x,y} Q_{UX}(u,x) W(y|x) \log \frac{q(x,y)^s e^{a(u,x)}}{\sum_{\bar{x}} Q_{X|U}(\bar{x}|u) q(\bar{x},y)^s e^{a(u,\bar{x})}} $$    (7)

$$ R_0 \le \sup_{\rho_1 \in [0,1],\, s \ge 0,\, a(\cdot,\cdot)} \sum_{u,x,y} Q_{UX}(u,x) W(y|x) \log \frac{\big( q(x,y)^s e^{a(u,x)} \big)^{\rho_1}}{\sum_{\bar{u}} Q_U(\bar{u}) \big( \sum_{\bar{x}} Q_{X|U}(\bar{x}|\bar{u}) q(\bar{x},y)^s e^{a(\bar{u},\bar{x})} \big)^{\rho_1}} - \rho_1 R_1. $$    (8)

Footnote 1: The condition in (6) has a slightly different form to that in [6], which contains the additional constraint $I_{\tilde{P}}(U;Y) \le R_0$ and replaces the 
$[\cdot]^+$ function in the objective by its argument. Both forms are given in [6], and their equivalence is proved therein. A simple way of seeing this 
equivalence is by noting that both expressions can be written as $0 \le \min_{\tilde{P}_{UXY}} \max\{ I_{\tilde{P}}(U,X;Y) - (R_0 + R_1),\ I_{\tilde{P}}(U;Y) - R_0 \}$. 

We outline the derivations of both the primal and dual expressions in Appendix A. 

We note that $C_{SC}$ is at least as high as Lapidoth's parallel coding rate [6]-[8], though it is not known whether 
it can be strictly higher. In [6], a refined version of superposition coding was shown to yield a rate improving on 
$I_{SC}(Q_{UX})$ for fixed $(\mathcal{U}, Q_{UX})$, but the standard version will suffice for our purposes. 

The above-mentioned technique of passing to the $k$-th order product alphabet is equally valid for the superposition 
coding achievable rate, and we denote the resulting achievable rate by $C_{SC}^{(k)}$. The rate $C_{SC}^{(2)}$ will be particularly 
important in this paper, and we will also use the analogous quantity $I_{SC}^{(2)}(Q_{UX})$ with a fixed input distribution 
$Q_{UX}$. Since the input alphabet of the product channel is $\mathcal{X}^2$, one might more precisely write the input distribution 
as $Q_{UX^{(2)}}$, but we omit this additional superscript. The choice $\mathcal{U} = \{0,1\}$ for the auxiliary alphabet will prove to 
be sufficient for our purposes. 


C. Converse 

Very few converse results have been provided for the mismatched decoding problem. Csiszar and Narayan [3] 
showed that $\lim_{k \to \infty} C_{LM}^{(k)} = C_M$ for erasures-only metrics, i.e. metrics such that $q(x,y) = \max_{\bar{x},\bar{y}} q(\bar{x},\bar{y})$ for 
all $(x,y)$ such that $W(y|x) > 0$. More recently, multi-letter converse results were given by Somekh-Baruch [11], 
yielding a general formula for the mismatched capacity in the sense of Verdu-Han [12]. However, these expressions 
are not computable. 

The only general single-letter converse result presented in the literature is that of Balakirsky [13], who reported 
that $C_{LM} = C_M$ for binary-input DMCs. In the following section, we provide a counter-example showing that in 
fact the strict inequality $C_M > C_{LM}$ can hold even in this case. 

II. The Counter-Example 

The main claim of this paper is the following; the details are given in Section III. 

Counter-Example 1. Let $\mathcal{X} = \{0,1\}$ and $\mathcal{Y} = \{0,1,2\}$, and consider the channel and metric described by the 
entries of the $|\mathcal{X}| \times |\mathcal{Y}|$ matrices 

$$ W = \begin{bmatrix} 0.97 & 0.03 & 0 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}, \qquad q = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0.5 & 1.36 \end{bmatrix}. $$    (9)

Then the LM rate satisfies 

$$ 0.136874 \le C_{LM} \le 0.136900 \text{ nats/use}, $$    (10)

whereas the superposition coding rate obtained by considering the second-order product of the channel is lower 
bounded by 

$$ C_{SC}^{(2)} \ge 0.137998 \text{ nats/use}. $$    (11)

Consequently, we have $C_M > C_{LM}$. 

Figure 1: Numerical evaluations of the LM rate $I_{LM}(Q)$ as a function of the (first entry of the) input distribution $Q(0)$, 
and the corresponding superposition coding rate $I_{SC}^{(2)}(Q_{UX})$ using the construction described in Section III-D. The 
matched capacity is $C \approx 0.4944$ nats/use, and is achieved by $Q(0) \approx 0.5398$. 

We proceed by presenting various points of discussion. 

Numerical Evaluations: While (10) and (11) are obtained using numerical computations, and the difference 
between the two is small, we will take care in ensuring that the gap is genuine, rather than being a matter of 
numerical accuracy. All of the code used in our computations is available online [14]. 
Figure 1 plots our numerical evaluations of $I_{LM}(Q)$ and $I_{SC}^{(2)}(Q_{UX})$ for a range of input distributions; for the 
latter, $Q_{UX}$ is determined from $Q$ in a manner to be described in Section III-D. Note that this plot is only meant 
to help the reader visualize the results; it is not sufficient to establish Counter-Example 1 in itself. Nevertheless, it 
is reassuring to see that the curves corresponding to the primal and dual expressions are indistinguishable. 
Our computations suggest that 

$$ C_{LM} \approx 0.136875 \text{ nats/use}, $$    (12)

and that the optimal input distribution is approximately 

$$ Q = \begin{bmatrix} 0.75597 & 0.24403 \end{bmatrix}. $$    (13)

The matched capacity is significantly higher than $C_{LM}$, namely $C \approx 0.4944$ nats/use, with a corresponding input 
distribution approximately equal to $[0.5398 \ \ 0.4602]$. As seen in the proof, the fact that the right-hand side of (10) 
exceeds that of (12) by $2.5 \times 10^{-5}$ is due to the use of (possibly crude) bounds on the loss in the rate when $Q$ is 
slightly suboptimal. 

Other Achievable Rates: One may question whether (11) can be improved by considering $C_{SC}^{(k)}$ for $k > 2$. 
However, we were unable to find any such improvement when we tried $k = 3$; see Section III-D for further 
discussion on this attempt. Similarly, we observed no improvement on (12) when we computed $I_{LM}^{(2)}(Q^{(2)})$ with 
a brute force search over $Q^{(2)} \in \mathcal{P}(\mathcal{X}^2)$ to two decimal places. Of course, it may still be that $C_{LM}^{(k)} > C_{LM}$ for 
some $k > 2$, but optimizing $Q^{(k)}$ quickly becomes computationally difficult; even for $k = 3$, the search space is 
7-dimensional with no apparent convexity structure. 

Our numerical findings also showed no improvement of the superposition coding rate $C_{SC}$ for the original channel 
(as opposed to the product channel) over the LM rate $C_{LM}$. 

We were also able to obtain the achievable rate in (11) by applying Lapidoth's expurgated parallel coding rate [5] (or 
more precisely, its dual formulation from [6]) to the second-order product channel. In fact, this was done by taking 
the input distribution $Q_{UX}$ and the dual parameters $(s, a, \rho_1)$ used in (7)-(8) (see Section III-D), and "transforming" 
them into parameters for the expurgated parallel coding ensemble that achieve an identical rate. Details are given 
in Appendix C. 

Choices of Channel and Metric: While the decoding metric in (9) may appear to be unusual, it should be 
noted that any decoding metric with $\max_{x,y} q(x,y) > 0$ is equivalent to another metric yielding a matrix of this 
form with the first row and first column equal to one [13]. 

One may question whether the LM rate can be improved for binary-input binary-output channels, as opposed to 
our ternary-output example. However, this is not possible, since for any such channel the LM rate is either equal 
to zero or the matched capacity, and in either case it coincides with the mismatched capacity [3]. 

Unfortunately, despite considerable effort, we have been unable to understand the analysis given in [13] in 
sufficient detail to identify any major errors therein. We also remark that for the vast majority of the examples we 
considered, $C_{LM}$ was indeed greater than or equal to all other achievable rates that we computed. However, (9) was 
not the only counter-example, and others were found with $\min_{x,y} W(y|x) > 0$ (in contrast with (9)). For example, 
a similar gap between the rates was observed when the first row of $W$ in (9) was replaced by $[0.97 \ \ 0.02 \ \ 0.01]$. 

III. Establishing Counter-Example 1 

While Counter-Example 1 is concerned with the specific channel and metric given in (9), we will present several 
results for more general channels with $\mathcal{X} = \{0,1\}$ and $\mathcal{Y} = \{0,1,2\}$ (and in some cases, arbitrary finite alphabets). 
To make some of the expressions more compact, we define $Q_x = Q(x)$, $W_{xy} = W(y|x)$ and $q_{xy} = q(x,y)$ 
throughout this section. 


A. Auxiliary Lemmas 

The optimization of $I_{LM}(Q)$ over $Q$ can be difficult, since $I_{LM}(Q)$ is non-concave in $Q$ in general [10]. Since 
we are considering the case $|\mathcal{X}| = 2$, this optimization is one-dimensional, and we thus resort to a straightforward 
brute-force search of $Q_0$ over a set of regularly-spaced points in $[0,1]$. To establish the upper bound in (10), we 
must bound the difference $C_{LM} - I_{LM}(Q)$ for the choice of $Q_0$ maximizing the LM rate among all such points. 
Lemma 2 below is used for precisely this purpose; before stating it, we present a preliminary result on the continuity 
of the binary entropy function $H_2(\alpha) = -\alpha \log \alpha - (1-\alpha) \log(1-\alpha)$. 

It is well-known that for two distributions $Q$ and $Q'$ on a common finite alphabet, we have $|H(Q') - H(Q)| \le 
\delta \log \frac{|\mathcal{X}|}{\delta}$ whenever $\|Q' - Q\|_1 \le \delta$ [15, Lemma 2.7]. The following lemma gives a refinement of this statement for 
the case that $|\mathcal{X}| = 2$ and $\min\{Q'_0, Q'_1\}$ is no smaller than a predetermined constant. 


Lemma 1. Let $Q' \in \mathcal{P}(\mathcal{X})$ be a PMF on $\mathcal{X} = \{0,1\}$ such that $\min\{Q'_0, Q'_1\} \ge Q'_{\min}$ for some $Q'_{\min} > 0$. For 
any PMF $Q \in \mathcal{P}(\mathcal{X})$ such that $|Q_0 - Q'_0| \le \delta$ (or equivalently, $|Q_1 - Q'_1| \le \delta$), we have 

$$ |H(Q') - H(Q)| \le \delta \log \frac{1 - Q'_{\min}}{Q'_{\min}}. $$    (14)

Proof: Set $\Delta = Q_0 - Q'_0$. Since $H_2(\cdot)$ is concave, the straight line tangent to a given point always lies above 
the function itself. Assuming without loss of generality that $Q'_0 \le 0.5$, we have 

$$ |H_2(Q'_0 + \Delta) - H_2(Q'_0)| \le |\Delta| \, \frac{dH_2}{d\alpha}\bigg|_{\alpha = Q'_0} $$    (15)

$$ = |\Delta| \log \frac{1 - Q'_0}{Q'_0}. $$    (16)

The desired result follows since $\log \frac{1 - Q'_0}{Q'_0}$ is decreasing in $Q'_0$, and since $Q'_0 \ge Q'_{\min}$ and $|\Delta| \le \delta$ by assumption. 

The following lemma builds on the preceding lemma, and is key to establishing Counter-Example 1. 


Lemma 2. For any binary-input mismatched DMC, we have the following under the setup of Lemma 1: 

$$ I_{LM}(Q) \ge I_{LM}(Q') - \delta \log \frac{1 - Q'_{\min}}{Q'_{\min}} - \frac{\delta \log 2}{Q'_{\min}}. $$    (17)

Proof: The bound in (17) is trivial when $I_{LM}(Q') = 0$, so we consider the case $I_{LM}(Q') > 0$. Observing that 
$Q'(x) > 0$ for $x \in \{0,1\}$, we can make the change of variable $\tilde{a}(x) = \log\big(Q'(x) e^{a(x)}\big)$ (i.e. $e^{\tilde{a}(x)} = Q'(x) e^{a(x)}$) in (4) to 
obtain 

$$ I_{LM}(Q') = \sup_{s \ge 0,\, \tilde{a}(\cdot)} \sum_{x,y} Q'(x) W(y|x) \log \frac{q(x,y)^s e^{\tilde{a}(x)}}{Q'(x) \sum_{\bar{x}} q(\bar{x},y)^s e^{\tilde{a}(\bar{x})}}, $$    (18)


August 11, 2015 


DRAFT 










which can equivalently be written as 

$$ I_{LM}(Q') = H(Q') - \inf_{s \ge 0,\, \tilde{a}(\cdot)} \sum_{x,y} Q'(x) W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^s e^{\tilde{a}(\bar{x})}}{q(x,y)^s e^{\tilde{a}(x)}} \bigg), $$    (19)

where $\bar{x} \in \{0,1\}$ denotes the unique symbol differing from $x \in \{0,1\}$. 

The following arguments can be simplified when the infimum is achieved, but for completeness we consider the 
general case. Let $(s_k, \tilde{a}_k)$ be a sequence of parameters such that 

$$ H(Q') - \lim_{k \to \infty} \sum_{x,y} Q'(x) W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) = I_{LM}(Q'). $$    (20)

Since the argument to the logarithm in (20) is no smaller than one, and since $H(Q') \le \log 2$ by the assumption 
that the input alphabet is binary, we have for $x = 0,1$ and sufficiently large $k$ that 

$$ \sum_{y} Q'(x) W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) \le \log 2, $$    (21)

since otherwise the left-hand side of (20) would be non-positive, in contradiction with the fact that we are considering 
the case $I_{LM}(Q') > 0$. Using the assumption $\min\{Q'_0, Q'_1\} \ge Q'_{\min}$, we can weaken (21) to 

$$ \sum_{y} W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) \le \frac{\log 2}{Q'_{\min}}. $$    (22)

We now have the following: 

$$ I_{LM}(Q) \ge H(Q) - \limsup_{k \to \infty} \sum_{x,y} Q(x) W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) $$    (23)

$$ \ge H(Q') - \limsup_{k \to \infty} \sum_{x,y} Q(x) W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) - \delta \log \frac{1 - Q'_{\min}}{Q'_{\min}} $$    (24)

$$ = H(Q') - \limsup_{k \to \infty} \sum_{x} \big( Q'(x) + Q(x) - Q'(x) \big) \sum_{y} W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) - \delta \log \frac{1 - Q'_{\min}}{Q'_{\min}} $$    (25)

$$ \ge H(Q') - \limsup_{k \to \infty} \sum_{x} Q'(x) \sum_{y} W(y|x) \log \bigg( 1 + \frac{q(\bar{x},y)^{s_k} e^{\tilde{a}_k(\bar{x})}}{q(x,y)^{s_k} e^{\tilde{a}_k(x)}} \bigg) - \frac{\delta \log 2}{Q'_{\min}} - \delta \log \frac{1 - Q'_{\min}}{Q'_{\min}} $$    (26)

$$ = I_{LM}(Q') - \delta \log \frac{1 - Q'_{\min}}{Q'_{\min}} - \frac{\delta \log 2}{Q'_{\min}}, $$    (27)

where (23) follows by replacing the infimum in (19) by the particular sequence of parameters $(s_k, \tilde{a}_k)$ and taking 
the lim sup, (24) follows from Lemma 1, (26) follows by applying (22) for the $x$ value where $Q(x) \ge Q'(x)$ and 
lower bounding the logarithm by zero for the other $x$ value, and (27) follows from (20). ■ 


B. Establishing the Upper Bound in (10) 

As mentioned in the previous subsection, we optimize $Q$ by performing a brute force search over a set of regularly 
spaced points, and then using Lemma 2 to bound the difference $C_{LM} - I_{LM}(Q)$. We let the input distribution therein 
be $Q' = \arg\max_Q I_{LM}(Q)$. Note that this maximum is always achieved, since $I_{LM}$ is continuous and bounded [3]. 
If there are multiple maximizers, we choose one arbitrarily among them. 

To apply Lemma 2, we need a constant $Q'_{\min}$ such that $\min\{Q'_0, Q'_1\} \ge Q'_{\min}$. We present a straightforward choice 
based on the lower bound on the left-hand side of (10) (proved in Section III-C). By choosing $Q'_{\min}$ such that even 
the mutual information $I(X;Y)$ is upper bounded by the left-hand side of (10) when $\min\{Q'_0, Q'_1\} < Q'_{\min}$, we see 
from the simple inequality $I_{LM}(Q) \le I(X;Y)$ [3] that $Q$ cannot maximize $I_{LM}$. For the example under consideration 
(see (9)), the choice $Q'_{\min} = 0.042$ turns out to be sufficient, and in fact yields $I(X;Y) \le 0.135$. This can be 
verified by computing $I(X;Y)$ to be (approximately) 0.0917, 0.4919 and 0.1348 for $Q_0 = 0.042$, $Q_0 = 0.5$ and 
$Q_0 = 1 - 0.042$ respectively, and then using the concavity of $I(X;Y)$ in $Q$ to handle $Q_0 \in [0, 0.042) \cup (1 - 0.042, 1]$. 
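The endpoint check used to justify the choice $Q'_{\min} = 0.042$ is easy to reproduce. The following Python sketch evaluates the matched mutual information $I(X;Y)$ in nats for the channel in (9) at the three values of $Q_0$ mentioned above. It is only an illustration in floating-point arithmetic; the function name `mutual_information` is a hypothetical helper and not part of the authors' code.

```python
import math

# Channel W(y|x) of Counter-Example 1 (rows indexed by x, columns by y).
W = [[0.97, 0.03, 0.0], [0.1, 0.1, 0.8]]


def mutual_information(q0, W):
    """I(X;Y) in nats for the binary input distribution Q = [q0, 1 - q0]."""
    Q = [q0, 1.0 - q0]
    ny = len(W[0])
    # Output distribution P_Y(y) = sum_x Q(x) W(y|x).
    PY = [sum(Q[x] * W[x][y] for x in range(2)) for y in range(ny)]
    total = 0.0
    for x in range(2):
        for y in range(ny):
            p = Q[x] * W[x][y]
            if p > 0.0:  # terms with zero probability contribute nothing
                total += p * math.log(W[x][y] / PY[y])
    return total


q_min = 0.042
mi_low = mutual_information(q_min, W)         # Q_0 = 0.042
mi_mid = mutual_information(0.5, W)           # Q_0 = 0.5
mi_high = mutual_information(1.0 - q_min, W)  # Q_0 = 1 - 0.042
```

Both endpoint values fall below the lower bound 0.136874 in (10), which is the property the argument relies on; concavity of $I(X;Y)$ in $Q$ then extends the conclusion to the two outer intervals.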

Let $h = 10^{-5}$, and suppose that we evaluate $I_{LM}(Q)$ for each $Q_0$ in the set 

$$ \mathcal{A} = \{ Q'_{\min},\ Q'_{\min} + h,\ \dots,\ 1 - Q'_{\min} - h,\ 1 - Q'_{\min} \}. $$    (28)

Since the optimal input distribution $Q'$ corresponds to some $Q'_0 \in [Q'_{\min}, 1 - Q'_{\min}]$, we conclude that there exists 
some $Q_0 \in \mathcal{A}$ such that $|Q'_0 - Q_0| \le \frac{h}{2}$. Substituting $\delta = \frac{h}{2} = 0.5 \times 10^{-5}$ and $Q'_{\min} = 0.042$ into (17), we conclude 
that 

$$ \max_{Q_0 \in \mathcal{A}} I_{LM}(Q) \ge C_{LM} - 0.982 \times 10^{-4}. $$    (29)

We now describe our techniques for evaluating $I_{LM}(Q)$ for a fixed choice of $Q$. This is straightforward in 
principle, since the corresponding optimization problem is convex whether we use the primal expression in (3) or 
the dual expression in (4). Nevertheless, since we need to test a large number of $Q_0$ values, we make an effort to 
find a reasonably efficient method. 

We avoid using the dual expression in (4), since it is a maximization problem; thus, if the final optimization 
parameters obtained differ slightly from the true optimal parameters, they will only provide a lower bound on 
$I_{LM}(Q)$. In contrast, the result that we seek is an upper bound. We also avoid evaluating (3) directly, since the 
equality constraints in the optimization problem could, in principle, be sensitive to numerical precision errors. 

Of course, there are many ways to circumvent these problems and provide rigorous bounds on the suboptimality 
of optimization procedures, including a number of generic solvers. We instead take a different approach, and reduce 
the primal optimization in (3) to a scalar minimization problem by eliminating the constraints one-by-one. This 
minimization will contain no equality constraints, and thus minor variations in the optimal parameter will still 
produce a valid upper bound. 

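We first note that the inequality constraint can be replaced by an equality whenever $I_{LM}(Q) > 0$ [2, Lemma 1], 
which is certainly the case for the present example. Moreover, since the $X$-marginal is constrained to equal $Q$, we 
can let the minimization be over $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ instead of $\mathcal{P}(\mathcal{X} \times \mathcal{Y})$, yielding 

$$ I_{LM}(Q) = \min_{\substack{\tilde{W} \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) \,:\, \tilde{P}_Y = P_Y, \\ \mathbb{E}_{Q \times \tilde{W}}[\log q(X,Y)] = \mathbb{E}_P[\log q(X,Y)]}} I_{Q \times \tilde{W}}(X;Y), $$    (30)

where $\tilde{P}_Y(y) = \sum_x Q(x) \tilde{W}(y|x)$ (recall also that $P_{XY} = Q \times W$). Let us fix a conditional distribution $\tilde{W}$ satisfying 
the specified constraints, and write $\tilde{W}_{xy} = \tilde{W}(y|x)$. The analogous matrix to $W$ in (9) can be written as follows: 

$$ \tilde{W} = \begin{bmatrix} \tilde{W}_{00} & \tilde{W}_{01} & 1 - \tilde{W}_{00} - \tilde{W}_{01} \\ \tilde{W}_{10} & \tilde{W}_{11} & 1 - \tilde{W}_{10} - \tilde{W}_{11} \end{bmatrix}. $$    (31)

Since $\tilde{P}_Y = P_Y$ implies $H(\tilde{P}_Y) = H(P_Y)$, we can write the objective in (30) as 

$$ I_{Q \times \tilde{W}}(X;Y) = H(P_Y) - H_{Q \times \tilde{W}}(Y|X) $$    (32)

$$ = H(P_Y) + Q_0 \big( \tilde{W}_{00} \log \tilde{W}_{00} + \tilde{W}_{01} \log \tilde{W}_{01} + (1 - \tilde{W}_{00} - \tilde{W}_{01}) \log(1 - \tilde{W}_{00} - \tilde{W}_{01}) \big) $$
$$ \quad + Q_1 \big( \tilde{W}_{10} \log \tilde{W}_{10} + \tilde{W}_{11} \log \tilde{W}_{11} + (1 - \tilde{W}_{10} - \tilde{W}_{11}) \log(1 - \tilde{W}_{10} - \tilde{W}_{11}) \big). $$    (33)

We now show that the equality constraints can be used to express each $\tilde{W}_{xy}$ in terms of $\tilde{W}_{10}$. Using $\tilde{P}_Y(y) = P_Y(y)$ 
for $y = 0,1$, along with the constraint containing the decoding metric, we have 

$$ Q_0 \tilde{W}_{00} + Q_1 \tilde{W}_{10} = P_Y(0) $$    (34)

$$ Q_0 \tilde{W}_{01} + Q_1 \tilde{W}_{11} = P_Y(1) $$    (35)

$$ Q_1 \big( \tilde{W}_{11} \log q_{11} + (1 - \tilde{W}_{10} - \tilde{W}_{11}) \log q_{12} \big) = \mathbb{E}_P[\log q(X,Y)], $$    (36)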
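where in (36) we used the fact that $\log q(x,y) = 0$ for four of the six $(x,y)$ pairs (see (9)). Re-arranging (34)-(36), 
we obtain 

$$ \tilde{W}_{00} = \frac{P_Y(0) - Q_1 \tilde{W}_{10}}{Q_0} $$    (37)

$$ \tilde{W}_{01} = \frac{P_Y(1) - Q_1 \tilde{W}_{11}}{Q_0} $$    (38)

$$ \tilde{W}_{11} = \frac{1}{\log q_{11} - \log q_{12}} \bigg( \frac{\mathbb{E}_P[\log q(X,Y)]}{Q_1} - (1 - \tilde{W}_{10}) \log q_{12} \bigg), $$    (39)

and substituting (39) into (38) yields 

$$ \tilde{W}_{01} = \frac{1}{Q_0} \bigg( P_Y(1) - \frac{1}{\log q_{11} - \log q_{12}} \Big( \mathbb{E}_P[\log q(X,Y)] - Q_1 (1 - \tilde{W}_{10}) \log q_{12} \Big) \bigg). $$    (40)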

We have thus written each entry of (33) in terms of $\tilde{W}_{10}$, and we are left with a one-dimensional optimization 
problem. However, we must still ensure that the constraints $\tilde{W}_{xy} \in [0,1]$ are satisfied for all $(x,y)$. Since each 
$\tilde{W}_{xy}$ is an affine function of $\tilde{W}_{10}$, these constraints are each of the form $\underline{W}^{(x,y)} \le \tilde{W}_{10} \le \overline{W}^{(x,y)}$, and the overall 
optimization is given by 

$$ \min_{\underline{W} \le \tilde{W}_{10} \le \overline{W}} f(\tilde{W}_{10}), $$    (41)

where $f(\cdot)$ denotes the right-hand side of (33) upon substituting (37), (39) and (40), and the lower and upper limits 
are given by $\underline{W} = \max_{x,y} \underline{W}^{(x,y)}$ and $\overline{W} = \min_{x,y} \overline{W}^{(x,y)}$. Note that the minimization region is non-empty, since 
$\tilde{W} = W$ is always feasible. In principle one could observe $\underline{W} = \overline{W} = W_{10}$, but in the present example we found 
that $\underline{W} < \overline{W}$ for every choice of $Q_0$ that we used. 

The optimization problem in (41) does not appear to permit an explicit solution. However, we can efficiently 
compute the solution to high accuracy using standard one-dimensional optimization methods. Since the convexity 
of any optimization problem is preserved by the elimination of equality constraints [16, Sec. 4.2.4], and since the 
optimization problem in (30) is convex for any given $Q$, we conclude that $f(\cdot)$ is a convex function. Its derivative 
is easily computed by noting that 

$$ \frac{d}{dz} (\alpha z + \beta) \log(\alpha z + \beta) = \alpha + \alpha \log(\alpha z + \beta) $$    (42)

for all $\alpha$, $\beta$ and $z$ yielding a positive argument to the logarithm. We can thus perform a bisection search as follows, 
where $f'(\cdot)$ denotes the derivative of $f$, and $\epsilon$ is a termination parameter: 

1) Set $i = 0$, $\underline{W}^{(0)} = \underline{W}$ and $\overline{W}^{(0)} = \overline{W}$; 

2) Set $W_{\mathrm{mid}} = \frac{1}{2}(\underline{W}^{(i)} + \overline{W}^{(i)})$; if $f'(W_{\mathrm{mid}}) \ge 0$ then set $\underline{W}^{(i+1)} = \underline{W}^{(i)}$ and $\overline{W}^{(i+1)} = W_{\mathrm{mid}}$; otherwise set 
$\underline{W}^{(i+1)} = W_{\mathrm{mid}}$ and $\overline{W}^{(i+1)} = \overline{W}^{(i)}$; 

3) If $|f'(W_{\mathrm{mid}})| \le \epsilon$ then terminate; otherwise increment $i$ and return to Step 2. 

As mentioned previously, we do not need to find the exact solution to (41), since any value of $\tilde{W}_{10} \in [\underline{W}, \overline{W}]$ 
yields a valid upper bound on $I_{LM}(Q)$. However, we must choose $\epsilon$ sufficiently small so that the bound in (10) is 
established. We found $\epsilon = 10^{-6}$ to suffice. 
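The scalar reduction and the bisection steps above can be sketched in Python as follows. This is not the authors' C implementation: it is a floating-point illustration that assumes, as in (9), a metric whose first row and first column are equal to one, and all function and variable names (`lm_rate_primal`, `aff`, and so on) are hypothetical. Each entry $\tilde{W}_{xy}$ is stored as a (slope, intercept) pair in $z = \tilde{W}_{10}$, following (37), (39) and the row-sum constraints.

```python
import math

W = [[0.97, 0.03, 0.0], [0.1, 0.1, 0.8]]   # channel in (9)
q = [[1.0, 1.0, 1.0], [1.0, 0.5, 1.36]]    # metric in (9)


def lm_rate_primal(Q0, W, q, eps=1e-6, max_iter=200):
    """Bisection on f'(z), z = W~_10; returns (z, f(z)), an upper bound on I_LM(Q)."""
    Q = [Q0, 1.0 - Q0]
    PY = [Q[0] * W[0][y] + Q[1] * W[1][y] for y in range(3)]
    L11, L12 = math.log(q[1][1]), math.log(q[1][2])
    E = Q[1] * (W[1][1] * L11 + W[1][2] * L12)  # E_P[log q]; log q = 0 elsewhere
    k = L12 / (L11 - L12)                       # slope of W~_11 in z, from (39)
    c11 = (E / Q[1] - L12) / (L11 - L12)        # intercept of W~_11, from (39)
    aff = {(1, 0): (1.0, 0.0), (1, 1): (k, c11), (1, 2): (-1.0 - k, 1.0 - c11)}
    for y in range(3):                          # first row via P~_Y = P_Y
        a, b = aff[(1, y)]
        aff[(0, y)] = (-Q[1] * a / Q[0], (PY[y] - Q[1] * b) / Q[0])

    # Feasible interval: every affine entry must be non-negative.
    lo, hi = -math.inf, math.inf
    for a, b in aff.values():
        if a > 0:
            lo = max(lo, -b / a)
        elif a < 0:
            hi = min(hi, -b / a)

    def f(z):  # right-hand side of (33)
        HY = -sum(p * math.log(p) for p in PY if p > 0)
        val = HY
        for (x, y), (a, b) in aff.items():
            w = a * z + b
            if w > 0:
                val += Q[x] * w * math.log(w)
        return val

    def fprime(z):  # term-by-term derivative using (42)
        return sum(Q[x] * a * (1.0 + math.log(a * z + b))
                   for (x, y), (a, b) in aff.items())

    mid = 0.5 * (lo + hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        d = fprime(mid)
        if abs(d) <= eps:
            break
        if d >= 0:
            hi = mid
        else:
            lo = mid
    return mid, f(mid)


z_opt, lm_upper = lm_rate_primal(0.75597, W, q)
```

Because any feasible $z$ gives a valid upper bound on $I_{LM}(Q)$, stopping the search early affects only the tightness of the bound, not its validity (up to floating-point error, which this sketch does not control).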

We implemented the preceding techniques in C (see [14] for the code) to upper bound $I_{LM}(Q)$ for each $Q_0 \in \mathcal{A}$; 
see Figure 1. As stated following Counter-Example 1, we found the highest value of $I_{LM}(Q)$ to be the right-hand 
side of (12), corresponding to the input distribution in (13). We found the corresponding minimizing parameter in 
(41) to be roughly $\tilde{W}_{10} = 0.4252347$. 

Instead of directly adding $10^{-4}$ to (12) in accordance with (29), we obtain a refined estimate by "updating" our 
estimate of $Q'_{\min}$. Specifically, using (29) and observing the values in Figure 1, we can conclude that the optimal 
value of $Q_0$ lies in the range $[0.7, 0.8]$ (we are being highly conservative here). Thus, setting $Q'_{\min} = 0.2$ and using 
the previously chosen value $\delta = 0.5 \times 10^{-5}$, we obtain the following refinement of (29): 

$$ \max_{Q_0 \in \mathcal{A}} I_{LM}(Q) \ge C_{LM} - 2.43 \times 10^{-5}. $$    (43)
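The two numerical gaps in (29) and (43) follow directly from the penalty terms in Lemma 2. As a quick sanity check, the following Python sketch evaluates $\delta \log \frac{1 - Q'_{\min}}{Q'_{\min}} + \frac{\delta \log 2}{Q'_{\min}}$ for both choices of $Q'_{\min}$; the helper name `lemma2_gap` is an assumption made for this illustration.

```python
import math


def lemma2_gap(delta, q_min):
    """Total penalty subtracted from I_LM(Q') on the right-hand side of (17)."""
    return delta * math.log((1.0 - q_min) / q_min) + delta * math.log(2.0) / q_min


h = 1e-5          # grid spacing of the set A in (28)
delta = h / 2     # worst-case distance from Q'_0 to the nearest grid point

gap_coarse = lemma2_gap(delta, 0.042)   # choice of Q'_min leading to (29)
gap_refined = lemma2_gap(delta, 0.2)    # choice of Q'_min leading to (43)
```

Both values come out slightly below the rounded constants $0.982 \times 10^{-4}$ and $2.43 \times 10^{-5}$ quoted in the text, so the rounding is in the conservative direction.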

Since our implementation in C is based on floating-point calculations, the final values may have precision errors. 
We therefore checked our numbers using Mathematica's arbitrary-precision arithmetic framework [17], which allows 
one to work with exact expressions that can then be displayed to arbitrarily many decimal places. More precisely, 
we loaded the values of $\tilde{W}_{10}$ into Mathematica and rounded them to 12 decimal places (this is allowed, since 
any value of $\tilde{W}_{10}$ yields a valid upper bound). Using the exact values of all other quantities (e.g. $Q$ and $W$), we 
performed an evaluation of $f(\tilde{W}_{10})$ in (41), and compared it to the corresponding value of $I_{LM}(Q)$ produced by 
the C program. The maximum discrepancy across all of the values of $Q_0$ was less than $2.1 \times 10^{-12}$. Our final 
bound in (10) was obtained by adding $2.5 \times 10^{-5}$ (which is, of course, higher than $2.43 \times 10^{-5} + 2.1 \times 10^{-12}$) to 
the right-hand side of (12). 

C. Establishing the Lower Bound in (10) 

For the lower bound, we can afford to be less careful than we were in establishing the upper bound; all we need 
is a suitable choice of $Q$ and the parameters $(s, a)$ in (4). We choose $Q$ as in (13), along with the following: 

$$ s = 9.031844 $$    (44)

$$ a = \begin{bmatrix} 0.355033 & -0.355033 \end{bmatrix}. $$    (45)

In Appendix B, we provide details on how these parameters were obtained, though the desired lower bound can 
readily be verified without knowing such details. 

Using these values, we evaluated the objective in (4) using Mathematica's arbitrary-precision arithmetic framework 
[17], thus eliminating the possibility of arithmetic precision errors. See [14] for the relevant C and Mathematica 
code. 
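The lower bound can be checked approximately with a few lines of Python. The sketch below evaluates the objective of the dual expression (4) at the parameters (13), (44) and (45) for the channel and metric in (9). It uses ordinary floating-point arithmetic rather than the arbitrary-precision arithmetic used by the authors, so it illustrates the computation rather than certifying the bound; the function name `lm_dual_objective` is hypothetical.

```python
import math

W = [[0.97, 0.03, 0.0], [0.1, 0.1, 0.8]]   # channel in (9)
q = [[1.0, 1.0, 1.0], [1.0, 0.5, 1.36]]    # metric in (9)


def lm_dual_objective(Q, W, q, s, a):
    """Objective of (4) for fixed (s, a); any such value lower bounds I_LM(Q)."""
    nx, ny = len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        # Denominator: sum over xbar of Q(xbar) q(xbar,y)^s e^{a(xbar)}.
        denom = sum(Q[xb] * q[xb][y] ** s * math.exp(a[xb]) for xb in range(nx))
        for x in range(nx):
            p = Q[x] * W[x][y]
            if p > 0.0:  # skip zero-probability (x, y) pairs
                total += p * math.log(q[x][y] ** s * math.exp(a[x]) / denom)
    return total


Q = [0.75597, 0.24403]            # input distribution in (13)
s = 9.031844                      # (44)
a = [0.355033, -0.355033]         # (45)
lm_lower = lm_dual_objective(Q, W, q, s, a)
```

Since (4) is a supremum, slightly perturbed values of $(s, a)$ still give a valid lower bound on $I_{LM}(Q)$, which is why no optimality certificate is needed for this direction.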


D. Establishing the Lower Bound in (11) 

We establish the lower bound in (11) by setting $\mathcal{U} = \{0,1\}$ and forming a suitable choice of $Q_{UX}$, and then 
using the dual expressions in (7)-(8) to lower bound $I_{SC}^{(2)}(Q_{UX})$. 

1) Choice of Input Distribution: Let $Q = [Q_0 \ \ Q_1]$ be some input distribution on $\mathcal{X}$, and define the corresponding 
product distribution on $\mathcal{X}^2$ as 

$$ Q^{(2)} = \begin{bmatrix} Q_0^2 & Q_0 Q_1 & Q_0 Q_1 & Q_1^2 \end{bmatrix}, $$    (46)

where the order of the inputs is (0,0), (0,1), (1,0), (1,1). Consider now the following choice of superposition 
coding parameters for the second-order product channel $(W^{(2)}, q^{(2)})$: 

$$ Q_U = \begin{bmatrix} 1 - Q_1^2 & Q_1^2 \end{bmatrix} $$    (47)

$$ Q_{X|U=0} = \frac{1}{1 - Q_1^2} \begin{bmatrix} Q_0^2 & Q_0 Q_1 & Q_0 Q_1 & 0 \end{bmatrix} $$    (48)

$$ Q_{X|U=1} = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}. $$    (49)

This choice yields an $X$-marginal $Q_X$ precisely given by (46), and it is motivated by the empirical observation 
from [6] that choices of $Q_{UX}$ where $Q_{X|U=1}$ and $Q_{X|U=2}$ have disjoint supports tend to provide good rates. We 
let the single-letter distribution $Q = [Q_0 \ \ Q_1]$ be 

$$ Q = \begin{bmatrix} 0.749 & 0.251 \end{bmatrix}, $$    (50)

which we chose based on a simple brute force search (see Figure 1). Note that this choice is similar to that in (13), 
but not identical. 

One may question whether the choice of the supports of $Q_{X|U=0}$ and $Q_{X|U=1}$ in (48)-(49) is optimal. For 
example, a similar construction might set $Q_U(0) = Q_0^2 + Q_0 Q_1$, and then replace (48)-(49) by normalized versions 
of $[Q_0^2 \ \ Q_0 Q_1 \ \ 0 \ \ 0]$ and $[0 \ \ 0 \ \ Q_0 Q_1 \ \ Q_1^2]$. However, after performing a brute force search over the possible support 
patterns (there are no more than $2^4$, and many can be ruled out by symmetry considerations), we found the above 
pattern to be the only one to give an improvement on $I_{LM}$, at least for the choices of input distribution in (13) and 
(50). In fact, even after setting $|\mathcal{U}| = 3$, considering the third-order product channel $(W^{(3)}, q^{(3)})$, and performing a 
similar brute force search over the support patterns (of which there are no more than $3^8$), we were unable to obtain 
an improvement on (11). 

2) Choices of Optimization Parameters: We now specify the choices of the dual parameters in (7)-(8). In 
Appendix B, we give details of how these parameters were obtained. We claim that the choice 

$$ (R_0, R_1) = (0.0356005, 0.2403966) $$    (51)

is permitted; observe that summing these two values and dividing by two (since we are considering the product 
channel) yields (11). These values can be verified by setting the parameters as follows: On the right-hand side of 
(7), set 

$$ s = 9.4261226 $$    (52)

$$ a = \begin{bmatrix} 0.4817048 & -0.2408524 & -0.2408524 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, $$    (53)

and on the right-hand side of (8), set 

$$ \rho_1 = 0.7587516 $$    (54)

$$ s = 9.3419338 $$    (55)

$$ a = \begin{bmatrix} 0.7186926 & -0.0488036 & -0.0488036 & 0 \\ 0 & 0 & 0 & -0.6210855 \end{bmatrix}. $$    (56)

Once again, we evaluated (7)-(8) using Mathematica's arbitrary-precision arithmetic framework [17], thus ensuring 
the validity of (11). See [14] for the relevant C and Mathematica code. 
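The verification described above can be mimicked in floating-point arithmetic. The Python sketch below builds the second-order product channel and metric from (9), forms $Q_{UX}$ as in (47)-(50), and evaluates the objectives of the dual expressions (7) and (8) at the parameters (52)-(56). It is a hedged illustration rather than the authors' Mathematica check: the names `inner`, `r1_dual` and `r0_dual` are hypothetical, and entries of $a$ that multiply zero-probability inputs do not affect the result.

```python
import math
from itertools import product

W = [[0.97, 0.03, 0.0], [0.1, 0.1, 0.8]]   # channel in (9)
q = [[1.0, 1.0, 1.0], [1.0, 0.5, 1.36]]    # metric in (9)

X2 = list(product(range(2), repeat=2))      # inputs (0,0), (0,1), (1,0), (1,1)
Y2 = list(product(range(3), repeat=2))      # the nine output pairs
W2 = {(x, y): W[x[0]][y[0]] * W[x[1]][y[1]] for x in X2 for y in Y2}
q2 = {(x, y): q[x[0]][y[0]] * q[x[1]][y[1]] for x in X2 for y in Y2}

Q0, Q1 = 0.749, 0.251                       # single-letter distribution (50)
QU = [1.0 - Q1 ** 2, Q1 ** 2]               # (47)
QXgU = [[Q0 * Q0 / QU[0], Q0 * Q1 / QU[0], Q0 * Q1 / QU[0], 0.0],  # (48)
        [0.0, 0.0, 0.0, 1.0]]               # (49)


def inner(u, y, s, a):
    """sum over xbar of Q_{X|U}(xbar|u) q2(xbar,y)^s e^{a(u,xbar)}."""
    return sum(QXgU[u][j] * q2[(X2[j], y)] ** s * math.exp(a[u][j]) for j in range(4))


def r1_dual(s, a):
    """Objective on the right-hand side of (7) for fixed (s, a)."""
    total = 0.0
    for u in range(2):
        for j, x in enumerate(X2):
            for y in Y2:
                p = QU[u] * QXgU[u][j] * W2[(x, y)]
                if p > 0.0:
                    num = q2[(x, y)] ** s * math.exp(a[u][j])
                    total += p * math.log(num / inner(u, y, s, a))
    return total


def r0_dual(rho, s, a, R1):
    """Objective on the right-hand side of (8) for fixed (rho, s, a) and given R1."""
    total = 0.0
    for y in Y2:
        denom = sum(QU[ub] * inner(ub, y, s, a) ** rho for ub in range(2))
        for u in range(2):
            for j, x in enumerate(X2):
                p = QU[u] * QXgU[u][j] * W2[(x, y)]
                if p > 0.0:
                    num = (q2[(x, y)] ** s * math.exp(a[u][j])) ** rho
                    total += p * math.log(num / denom)
    return total - rho * R1


a7 = [[0.4817048, -0.2408524, -0.2408524, 0.0], [0.0, 0.0, 0.0, 0.0]]         # (53)
a8 = [[0.7186926, -0.0488036, -0.0488036, 0.0], [0.0, 0.0, 0.0, -0.6210855]]  # (56)
R1 = r1_dual(9.4261226, a7)                 # s from (52)
R0 = r0_dual(0.7587516, 9.3419338, a8, R1)  # rho_1 from (54), s from (55)
rate_per_use = (R0 + R1) / 2.0              # normalize by the product order k = 2
```

The resulting pair should be close to (51), and `rate_per_use` close to the lower bound in (11); small deviations in the last digits are expected from floating-point rounding.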


IV. Conclusion and Discussion 

We have used our numerical findings, along with an analysis of the gap to suboptimality for slightly suboptimal 
input distributions, to show that it is possible for $C_M$ to exceed $C_{LM}$ even for binary-input mismatched DMCs. 
This is in contrast with the claim in [13] that $C_M = C_{LM}$ for such channels. 

An interesting direction for future research is to find a purely theoretical proof of Counter-Example 1; the 
non-concavity of $I_{LM}(Q)$ observed in Figure 1 may play a role in such an investigation. Furthermore, it would be of 
significant interest to develop a better understanding of [13], including which parts may be incorrect, under what 
conditions the converse remains valid, and in the remaining cases, whether a valid converse lying in between the 
LM rate and matched capacity can be inferred. 


Appendix A 

Derivations of the Superposition Coding Rates 
Here we outline how the superposition coding rates in (5)–(8) are obtained using the techniques of [6], [7]. The
equivalence of the primal and dual formulations can also be proved using techniques presented in [6].

A. Preliminary Definitions and Results

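The parameters of the random-coding ensemble are a finite auxiliary alphabet $\mathcal{U}$, an auxiliary codeword distribution
$P_{\boldsymbol{U}}$, and a conditional codeword distribution $P_{\boldsymbol{X}|\boldsymbol{U}}$. An auxiliary codebook $\{\boldsymbol{U}^{(i)}\}_{i=1}^{M_0}$ with $M_0$ codewords is
generated, with each auxiliary codeword independently distributed according to $P_{\boldsymbol{U}}$. For each $i = 1, \dots, M_0$, a
codebook $\{\boldsymbol{X}^{(i,j)}\}_{j=1}^{M_1}$ with $M_1$ codewords is generated, with each codeword conditionally independently distributed
according to $P_{\boldsymbol{X}|\boldsymbol{U}}(\cdot|\boldsymbol{U}^{(i)})$. The message $m$ at the input to the encoder is indexed as $(m_0, m_1)$, and for any such
pair, the corresponding transmitted codeword is $\boldsymbol{X}^{(m_0,m_1)}$. Thus, the overall number of messages is $M = M_0 M_1$,
yielding a rate of $R = R_0 + R_1$. More compactly, we have

$$\Big\{\Big(\boldsymbol{U}^{(i)}, \{\boldsymbol{X}^{(i,j)}\}_{j=1}^{M_1}\Big)\Big\}_{i=1}^{M_0} \sim \prod_{i=1}^{M_0} P_{\boldsymbol{U}}(\boldsymbol{u}^{(i)}) \prod_{j=1}^{M_1} P_{\boldsymbol{X}|\boldsymbol{U}}(\boldsymbol{x}^{(i,j)}|\boldsymbol{u}^{(i)}). \tag{57}$$

We assume without loss of generality that message $(1,1)$ is transmitted, and we write $\boldsymbol{U}$ and $\boldsymbol{X}$ in place of $\boldsymbol{U}^{(1)}$
and $\boldsymbol{X}^{(1,1)}$ respectively. We write $\widetilde{\boldsymbol{X}}$ to denote an arbitrary codeword $\boldsymbol{X}^{(1,j)}$ with $j \ne 1$. Moreover, we let $\overline{\boldsymbol{U}}$ denote
an arbitrary auxiliary codeword $\boldsymbol{U}^{(i)}$ with $i \ne 1$, let $\overline{\boldsymbol{X}}^{(j)}$ ($j = 1, \dots, M_1$) denote the corresponding codeword
$\boldsymbol{X}^{(i,j)}$, and let $\overline{\boldsymbol{X}}$ denote $\overline{\boldsymbol{X}}^{(j)}$ for an arbitrary choice of $j$. Thus, defining $\boldsymbol{Y}$ to be the channel output, we have

$$(\boldsymbol{U}, \boldsymbol{X}, \boldsymbol{Y}, \widetilde{\boldsymbol{X}}, \overline{\boldsymbol{U}}, \overline{\boldsymbol{X}}) \sim P_{\boldsymbol{U}}(\boldsymbol{u}) P_{\boldsymbol{X}|\boldsymbol{U}}(\boldsymbol{x}|\boldsymbol{u}) W^n(\boldsymbol{y}|\boldsymbol{x}) P_{\boldsymbol{X}|\boldsymbol{U}}(\widetilde{\boldsymbol{x}}|\boldsymbol{u}) P_{\boldsymbol{U}}(\overline{\boldsymbol{u}}) P_{\boldsymbol{X}|\boldsymbol{U}}(\overline{\boldsymbol{x}}|\overline{\boldsymbol{u}}). \tag{58}$$

The decoder estimates $\hat{m} = (\hat{m}_0, \hat{m}_1)$ according to the decoding rule in (1). We define the following error events:

(Type 0) $q^n(\boldsymbol{X}^{(i,j)}, \boldsymbol{Y}) \ge q^n(\boldsymbol{X}, \boldsymbol{Y})$ for some $i \ne 1$, $j$;

(Type 1) $q^n(\boldsymbol{X}^{(1,j)}, \boldsymbol{Y}) \ge q^n(\boldsymbol{X}, \boldsymbol{Y})$ for some $j \ne 1$.

The probabilities of these events are denoted by $p_{e,0}(n, M_0, M_1)$ and $p_{e,1}(n, M_1)$ respectively. The overall
random-coding error probability $p_e(n, M_0, M_1)$ clearly satisfies $p_e \le p_{e,0} + p_{e,1}$.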
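We begin by deriving the following non-asymptotic bounds:

$$p_{e,0}(n, M_0, M_1) \le \mathbb{E}\left[\min\left\{1, (M_0 - 1)\,\mathbb{E}\left[\min\left\{1, M_1 \mathbb{P}\left[\frac{q^n(\overline{\boldsymbol{X}}, \boldsymbol{Y})}{q^n(\boldsymbol{X}, \boldsymbol{Y})} \ge 1 \,\Big|\, \overline{\boldsymbol{U}}\right]\right\} \,\Big|\, \boldsymbol{U}, \boldsymbol{X}, \boldsymbol{Y}\right]\right\}\right] \tag{59}$$

$$p_{e,1}(n, M_1) \le \mathbb{E}\left[\min\left\{1, (M_1 - 1)\,\mathbb{P}\left[\frac{q^n(\widetilde{\boldsymbol{X}}, \boldsymbol{Y})}{q^n(\boldsymbol{X}, \boldsymbol{Y})} \ge 1 \,\Big|\, \boldsymbol{U}, \boldsymbol{X}, \boldsymbol{Y}\right]\right\}\right]. \tag{60}$$

We will see that (59) recovers the rate conditions in (6) and (8), whereas (60) recovers those in (5) and (7). We
focus on the type-0 event throughout the appendix, since the type-1 event is simpler, and is handled using standard
techniques associated with the case of independent codewords.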
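
Both bounds have the form of a truncated union bound, $\mathbb{P}[\bigcup_i A_i] \le \min\{1, \sum_i \mathbb{P}[A_i]\}$. As a rough numerical illustration, the Python sketch below compares this bound with the exact union probability in the simplest case of independent, equiprobable events. The independence assumption and the function names are ours, chosen only to make the exact value available in closed form.

```python
def exact_union_probability(p: float, count: int) -> float:
    """P[at least one of `count` independent events occurs], each of probability p."""
    return 1.0 - (1.0 - p) ** count


def truncated_union_bound(p: float, count: int) -> float:
    """min{1, count * p}: the union bound, truncated at one."""
    return min(1.0, count * p)


if __name__ == "__main__":
    # The bound is tight when count * p is small and saturates at one otherwise.
    for p, count in [(1e-3, 10), (1e-3, 1000), (1e-3, 100000)]:
        print(p, count, exact_union_probability(p, count), truncated_union_bound(p, count))
```

The truncation is what keeps the bound useful when the number of competing codewords grows exponentially in $n$, as it does in (59) and (60).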

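To derive (59), we first note that

$$p_{e,0} = \mathbb{P}\left[\bigcup_{i \ne 1,\, j} \left\{\frac{q^n(\boldsymbol{X}^{(i,j)}, \boldsymbol{Y})}{q^n(\boldsymbol{X}, \boldsymbol{Y})} \ge 1\right\}\right]. \tag{61}$$

Writing the probability as an expectation given $(\boldsymbol{U}, \boldsymbol{X}, \boldsymbol{Y})$ and applying the truncated union bound to the union
over $i$, we obtain

$$p_{e,0} \le \mathbb{E}\left[\min\left\{1, (M_0 - 1)\,\mathbb{P}\left[\bigcup_{j} \left\{\frac{q^n(\overline{\boldsymbol{X}}^{(j)}, \boldsymbol{Y})}{q^n(\boldsymbol{X}, \boldsymbol{Y})} \ge 1\right\} \,\Big|\, \boldsymbol{U}, \boldsymbol{X}, \boldsymbol{Y}\right]\right\}\right]. \tag{62}$$

Applying the same argument to the union over $j$, we obtain the desired upper bound.

Before proceeding, we introduce some standard notation and terminology associated with the method of types
(e.g. see [15, Ch. 2]). For a given joint type, say $P_{UX}$, the type class $T^n(P_{UX})$ is defined to be the set of all
sequences in $\mathcal{U}^n \times \mathcal{X}^n$ with type $P_{UX}$. Similarly, for a given joint type $P_{UX}$ and sequence $\boldsymbol{u} \in T^n(P_U)$, the
conditional type class $T^n_{\boldsymbol{u}}(P_{UX})$ is defined to be the set of all sequences $\boldsymbol{x}$ such that $(\boldsymbol{u}, \boldsymbol{x}) \in T^n(P_{UX})$.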

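In the remainder of the appendix, we consider the constant-composition ensemble, described by

$$P_{\boldsymbol{U}}(\boldsymbol{u}) = \frac{1}{|T^n(Q_U)|} \mathbb{1}\{\boldsymbol{u} \in T^n(Q_U)\} \tag{63}$$

$$P_{\boldsymbol{X}|\boldsymbol{U}}(\boldsymbol{x}|\boldsymbol{u}) = \frac{1}{|T^n_{\boldsymbol{u}}(Q_{UX})|} \mathbb{1}\{\boldsymbol{x} \in T^n_{\boldsymbol{u}}(Q_{UX})\}. \tag{64}$$

Here we have assumed that $Q_{UX}$ is a joint type for notational convenience; more generally, we can approximate
$Q_{UX}$ by a joint type and the analysis is unchanged.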
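
To make the ensemble concrete, the following Python sketch draws sequences uniformly from a type class and from a conditional type class by shuffling a sequence of fixed composition. It is an illustration only: types are represented by symbol counts, and all function names are hypothetical rather than taken from the authors' implementation.

```python
import random
from collections import Counter
from math import factorial


def empirical_type(seq):
    """Return the type of `seq` as a dict mapping symbol -> count."""
    return dict(Counter(seq))


def type_class_size(counts):
    """Multinomial coefficient n! / prod(counts[a]!), i.e. the size of the type class."""
    size = factorial(sum(counts.values()))
    for c in counts.values():
        size //= factorial(c)
    return size


def sample_type_class(counts, rng):
    """Draw a sequence uniformly from the type class with the given symbol counts."""
    seq = [a for a, c in counts.items() for _ in range(c)]
    rng.shuffle(seq)
    return seq


def sample_conditional_type_class(u_seq, joint_counts, rng):
    """Draw x uniformly such that (u_seq, x) has joint counts `joint_counts`.

    `joint_counts` maps (u, x) -> count and must be consistent with u_seq.
    """
    x_seq = [None] * len(u_seq)
    for u in set(u_seq):
        positions = [i for i, ui in enumerate(u_seq) if ui == u]
        symbols = [x for (uu, x), c in joint_counts.items() if uu == u for _ in range(c)]
        rng.shuffle(symbols)
        for i, x in zip(positions, symbols):
            x_seq[i] = x
    return x_seq


if __name__ == "__main__":
    rng = random.Random(0)
    u = sample_type_class({0: 3, 1: 1}, rng)
    x = sample_conditional_type_class(u, {(0, "a"): 2, (0, "b"): 1, (1, "b"): 1}, rng)
    print(u, x, type_class_size({0: 3, 1: 1}))
```

Shuffling a fixed multiset is uniform over its distinct arrangements, which is exactly the uniform distribution on the type class in (63); applying the same idea separately within the positions of each auxiliary symbol gives (64).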


B. Derivation of the Primal Expression 

We will derive (6) by showing that the error exponent of $p_{e,0}$ (i.e. $\liminf_{n\to\infty} -\frac{1}{n} \log p_{e,0}$) is lower bounded by

$$E_{r,0}(Q_{UX}, R_0, R_1) = \min_{P_{UXY} \,:\, P_{UX} = Q_{UX}} \;\; \min_{\substack{\widetilde{P}_{UXY} \,:\, \widetilde{P}_{UX} = Q_{UX},\, \widetilde{P}_Y = P_Y \\ \mathbb{E}_{\widetilde{P}}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} D(P_{UXY} \| Q_{UX} \times W) + \Big[ I_{\widetilde{P}}(U;Y) + \big[I_{\widetilde{P}}(X;Y|U) - R_1\big]^+ - R_0 \Big]^+. \tag{65}$$

The objective is always positive when $P_{UXY}$ is bounded away from $Q_{UX} \times W$. Hence, and by applying a standard
continuity argument as in [2, Lemma 1], we may substitute $P_{UXY} = Q_{UX} \times W$ to obtain the desired rate condition
in (6).

We obtain (65) by analyzing (59) using the method of types. Except where stated otherwise, it should be
understood that unions, summations, and minimizations over joint distributions (e.g. $\widetilde{P}_{UXY}$) are only over joint
types, rather than being over all distributions.

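Let us first condition on $\boldsymbol{U} = \boldsymbol{u}$, $\boldsymbol{X} = \boldsymbol{x}$, $\boldsymbol{Y} = \boldsymbol{y}$ and $\overline{\boldsymbol{U}} = \overline{\boldsymbol{u}}$ being fixed sequences, and let $P_{UXY}$ and $\widetilde{P}_{UY}$
respectively denote the joint types of $(\boldsymbol{u}, \boldsymbol{x}, \boldsymbol{y})$ and $(\overline{\boldsymbol{u}}, \boldsymbol{y})$. We can write the inner probability in (59) as

$$\mathbb{P}\left[\bigcup_{\substack{\widetilde{P}'_{UXY} \,:\, \widetilde{P}'_{UX} = Q_{UX},\, \widetilde{P}'_{UY} = \widetilde{P}_{UY} \\ \mathbb{E}_{\widetilde{P}'}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} \Big\{(\overline{\boldsymbol{u}}, \overline{\boldsymbol{X}}, \boldsymbol{y}) \in T^n(\widetilde{P}'_{UXY})\Big\} \,\Big|\, \overline{\boldsymbol{U}} = \overline{\boldsymbol{u}}\right]. \tag{66}$$

Note that the constraint $\widetilde{P}'_{UX} = Q_{UX}$ arises since every $(\overline{\boldsymbol{u}}, \overline{\boldsymbol{x}})$ pair has joint type $Q_{UX}$ by construction. Applying
the union bound, the property of types $\mathbb{P}[(\overline{\boldsymbol{u}}, \overline{\boldsymbol{X}}, \boldsymbol{y}) \in T^n(\widetilde{P}'_{UXY}) \,|\, \overline{\boldsymbol{U}} = \overline{\boldsymbol{u}}] \le e^{-n I_{\widetilde{P}'}(X;Y|U)}$ [15, Ch. 2], and the
fact that the number of joint types is polynomial in $n$, we see that the negative normalized (by $\frac{1}{n}$) logarithm of
(66) is lower bounded by

$$\min_{\substack{\widetilde{P}'_{UXY} \,:\, \widetilde{P}'_{UX} = Q_{UX},\, \widetilde{P}'_{UY} = \widetilde{P}_{UY} \\ \mathbb{E}_{\widetilde{P}'}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} I_{\widetilde{P}'}(X;Y|U) \tag{67}$$

plus an asymptotically vanishing term.

Next, we write the inner expectation in (59) (conditioned on $\boldsymbol{U} = \boldsymbol{u}$, $\boldsymbol{X} = \boldsymbol{x}$ and $\boldsymbol{Y} = \boldsymbol{y}$) as

$$\sum_{\widetilde{P}_{UY} \,:\, \widetilde{P}_U = Q_U,\, \widetilde{P}_Y = P_Y} \mathbb{P}\big[(\overline{\boldsymbol{U}}, \boldsymbol{y}) \in T^n(\widetilde{P}_{UY})\big] \min\left\{1, M_1 \mathbb{P}\left[\bigcup_{\substack{\widetilde{P}'_{UXY} \,:\, \widetilde{P}'_{UX} = Q_{UX},\, \widetilde{P}'_{UY} = \widetilde{P}_{UY} \\ \mathbb{E}_{\widetilde{P}'}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} \Big\{(\overline{\boldsymbol{u}}, \overline{\boldsymbol{X}}, \boldsymbol{y}) \in T^n(\widetilde{P}'_{UXY})\Big\} \,\Big|\, \overline{\boldsymbol{U}} = \overline{\boldsymbol{u}}\right]\right\}, \tag{68}$$

where now $\overline{\boldsymbol{u}}$ denotes an arbitrary sequence such that $(\overline{\boldsymbol{u}}, \boldsymbol{y}) \in T^n(\widetilde{P}_{UY})$. Combining (67) with the property of
types $\mathbb{P}[(\overline{\boldsymbol{U}}, \boldsymbol{y}) \in T^n(\widetilde{P}_{UY})] \le e^{-n I_{\widetilde{P}}(U;Y)}$ [15, Ch. 2], we see that the negative normalized logarithm of (68) is
lower bounded by

$$\min_{\widetilde{P}_{UY} \,:\, \widetilde{P}_U = Q_U,\, \widetilde{P}_Y = P_Y} I_{\widetilde{P}}(U;Y) + \min_{\substack{\widetilde{P}'_{UXY} \,:\, \widetilde{P}'_{UX} = Q_{UX},\, \widetilde{P}'_{UY} = \widetilde{P}_{UY} \\ \mathbb{E}_{\widetilde{P}'}[\log q(X,Y)] \ge \mathbb{E}_P[\log q(X,Y)]}} \big[I_{\widetilde{P}'}(X;Y|U) - R_1\big]^+ \tag{69}$$

plus an asymptotically vanishing term. Combining the two minimizations into one via the constraint $\widetilde{P}'_{UY} = \widetilde{P}_{UY}$,
we see that the right-hand side of (69) coincides with that of (6) (note, however, that $P_{UXY}$ need not equal $Q_{UX} \times W$
at this stage). Finally, the derivation of (65) is concluded by handling the outer expectation in (59) in the same way
as the inner one, applying the property $\mathbb{P}[(\boldsymbol{U}, \boldsymbol{X}, \boldsymbol{Y}) \in T^n(P_{UXY})] \le e^{-n D(P_{UXY} \| Q_{UX} \times W)}$ [15, Ch. 2] (which
follows since $(\boldsymbol{U}, \boldsymbol{X})$ is uniform on $T^n(Q_{UX})$), and expanding the minimization set from joint types to general
joint distributions.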
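
For a fixed candidate joint distribution, the quantity being minimized in the type-0 analysis, $I(U;Y) + [I(X;Y|U) - R_1]^+$, is straightforward to evaluate. The Python sketch below does so for a joint pmf stored as a dictionary keyed by $(u, x, y)$. It only evaluates the objective at a given point and does not perform the constrained minimization; the data layout and function names are assumptions made for illustration.

```python
from math import log


def marginal(p, keep):
    """Marginalize a joint pmf {(u, x, y): prob} onto the coordinate indices in `keep`."""
    out = {}
    for key, prob in p.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out


def mutual_information_uy(p):
    """I(U;Y) in nats under the joint pmf p."""
    p_uy, p_u, p_y = marginal(p, (0, 2)), marginal(p, (0,)), marginal(p, (2,))
    return sum(v * log(v / (p_u[(u,)] * p_y[(y,)])) for (u, y), v in p_uy.items() if v > 0)


def conditional_mutual_information(p):
    """I(X;Y|U) in nats under the joint pmf p."""
    p_ux, p_uy, p_u = marginal(p, (0, 1)), marginal(p, (0, 2)), marginal(p, (0,))
    return sum(
        v * log(v * p_u[(u,)] / (p_ux[(u, x)] * p_uy[(u, y)]))
        for (u, x, y), v in p.items()
        if v > 0
    )


def type0_objective(p, r1):
    """I(U;Y) + [I(X;Y|U) - R_1]^+ evaluated at the joint pmf p."""
    return mutual_information_uy(p) + max(0.0, conditional_mutual_information(p) - r1)


if __name__ == "__main__":
    # Toy joint pmf: U is constant, X is a fair bit, and Y = X.
    p = {(0, 0, 0): 0.5, (0, 1, 1): 0.5}
    print(type0_objective(p, 0.5))
```

The positive part reflects the two regimes of the type-0 event: when $R_1$ exceeds $I(X;Y|U)$, only the auxiliary codeword contributes to the exponent.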


C. Derivation of the Dual Expression 

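Expanding (59) and applying Markov's inequality and $\min\{1, \alpha\} \le \alpha^{\rho}$ ($\rho \in [0,1]$), we obtain

$$p_{e,0} \le \sum_{\boldsymbol{u}, \boldsymbol{x}, \boldsymbol{y}} P_{\boldsymbol{U}}(\boldsymbol{u}) P_{\boldsymbol{X}|\boldsymbol{U}}(\boldsymbol{x}|\boldsymbol{u}) W^n(\boldsymbol{y}|\boldsymbol{x}) \left( M_0 \sum_{\overline{\boldsymbol{u}}} P_{\boldsymbol{U}}(\overline{\boldsymbol{u}}) \left( M_1 \frac{\sum_{\overline{\boldsymbol{x}}} P_{\boldsymbol{X}|\boldsymbol{U}}(\overline{\boldsymbol{x}}|\overline{\boldsymbol{u}})\, q^n(\overline{\boldsymbol{x}}, \boldsymbol{y})^s}{q^n(\boldsymbol{x}, \boldsymbol{y})^s} \right)^{\rho_1} \right)^{\rho_0} \tag{70}$$

for any $\rho_0 \in [0,1]$, $\rho_1 \in [0,1]$ and $s \ge 0$. Let $a(u,x)$ be an arbitrary function on $\mathcal{U} \times \mathcal{X}$, and let $a^n(\boldsymbol{u}, \boldsymbol{x}) =
\sum_{i=1}^{n} a(u_i, x_i)$ be its additive $n$-letter extension. Since $(\boldsymbol{U}, \boldsymbol{X})$ and $(\overline{\boldsymbol{U}}, \overline{\boldsymbol{X}})$ have the same joint type (namely,
$Q_{UX}$) by construction, we have $a^n(\boldsymbol{u}, \boldsymbol{x}) = a^n(\overline{\boldsymbol{u}}, \overline{\boldsymbol{x}})$, and hence we can write (70) as

$$p_{e,0} \le \sum_{\boldsymbol{u}, \boldsymbol{x}, \boldsymbol{y}} P_{\boldsymbol{U}}(\boldsymbol{u}) P_{\boldsymbol{X}|\boldsymbol{U}}(\boldsymbol{x}|\boldsymbol{u}) W^n(\boldsymbol{y}|\boldsymbol{x}) \left( M_0 \sum_{\overline{\boldsymbol{u}}} P_{\boldsymbol{U}}(\overline{\boldsymbol{u}}) \left( M_1 \frac{\sum_{\overline{\boldsymbol{x}}} P_{\boldsymbol{X}|\boldsymbol{U}}(\overline{\boldsymbol{x}}|\overline{\boldsymbol{u}})\, q^n(\overline{\boldsymbol{x}}, \boldsymbol{y})^s e^{a^n(\overline{\boldsymbol{u}}, \overline{\boldsymbol{x}})}}{q^n(\boldsymbol{x}, \boldsymbol{y})^s e^{a^n(\boldsymbol{u}, \boldsymbol{x})}} \right)^{\rho_1} \right)^{\rho_0}. \tag{71}$$

Upper bounding each constant-composition distribution by a polynomial factor times the corresponding i.i.d.
distribution (i.e. $P_{\boldsymbol{U}}(\boldsymbol{u}) \le (n+1)^{|\mathcal{U}|-1} \prod_{i=1}^{n} Q_U(u_i)$ [15, Ch. 2]), we see the exponent of $p_{e,0}$ is lower bounded
by

$$\max_{\rho_0 \in [0,1],\, \rho_1 \in [0,1]} E_0(Q_{UX}, \rho_0, \rho_1) - \rho_0 (R_0 + \rho_1 R_1), \tag{72}$$

where

$$E_0(Q_{UX}, \rho_0, \rho_1) = \sup_{s \ge 0,\, a(\cdot,\cdot)} -\log \sum_{u, x, y} Q_{UX}(u,x) W(y|x) \left( \sum_{\overline{u}} Q_U(\overline{u}) \left( \frac{\sum_{\overline{x}} Q_{X|U}(\overline{x}|\overline{u})\, q(\overline{x}, y)^s e^{a(\overline{u}, \overline{x})}}{q(x,y)^s e^{a(u,x)}} \right)^{\rho_1} \right)^{\rho_0}. \tag{73}$$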
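
Two elementary inequalities carry the derivation above: $\min\{1, \alpha\} \le \alpha^{\rho}$ for $\rho \in [0,1]$, and the polynomial bound relating the constant-composition distribution to the i.i.d. distribution. The Python sketch below checks both numerically, the second only for a binary auxiliary alphabet (so that the polynomial factor is $n+1$). The function names and the restriction to the binary case are assumptions made to keep the example short.

```python
from math import comb


def min_one_bound_holds(alpha: float, rho: float) -> bool:
    """Check min{1, alpha} <= alpha**rho for alpha >= 0 and rho in [0, 1]."""
    return min(1.0, alpha) <= alpha ** rho + 1e-12


def constant_composition_ratio(n: int, k: int) -> float:
    """Binary alphabet with type Q = (1 - k/n, k/n).

    Returns P(u) / prod_i Q(u_i) for any u in the type class, where P is
    uniform on the type class; the bound says this is at most n + 1.
    """
    q1 = k / n
    iid_prob = (1 - q1) ** (n - k) * q1 ** k
    return (1.0 / comb(n, k)) / iid_prob


if __name__ == "__main__":
    print(min_one_bound_holds(0.3, 0.5), min_one_bound_holds(7.0, 0.2))
    print(constant_composition_ratio(20, 5), 20 + 1)
```

Because the ratio grows only polynomially in $n$, replacing the constant-composition distributions by i.i.d. ones does not change the exponent.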

We obtain (8) in the same way as Gallager's single-user analysis [18, Sec. 5.6] by evaluating the partial derivative
of the objective in (73) at $\rho_0 = 0$ (see also [19] for the corresponding approach to deriving the LM rate).


Appendix B

Further Numerical Techniques Used

In this section, we present further details of our numerical techniques for the sake of reproducibility. Except
where stated otherwise, the implementations were done in C. Our code is available online [14].

The algorithms here do not play a direct role in establishing Counter-Example 1. We thus resort to "ad-hoc"
approaches with manually-tuned parameters. In particular, we make no claims regarding the convergence of these
algorithms or their effectiveness in handling channels and decoding metrics differing from Counter-Example 1.

A. Evaluating $I_{\mathrm{LM}}(Q)$ via the Dual Expression and Gradient Descent

Here we describe how we optimized the parameters in the dual expression for the LM rate for a fixed input distribution $Q$ to produce the dual
values plotted in Figure 1. Note that replacing the optimization by fixed values leads to a lower bound, whereas for
the primal expression it led to an upper bound. Thus, since the two are very close in Figure 1, we can be assured
that the true value of $I_{\mathrm{LM}}(Q)$ has been characterized accurately, at least for the values of $Q$ shown. While we focus
on the binary-input setting here, the techniques can be applied to an arbitrary mismatched DMC. For brevity, we
write $a_x = a(x)$.

Let $Q$ be given, and let $I(v)$ be the corresponding dual objective as a function of $v = [s \;\; a_0 \;\; a_1]^T$. Moreover, let
$\nabla I(v)$ denote the corresponding $3 \times 1$ gradient vector containing the partial derivatives of $I(\cdot)$. These are all easily
evaluated in closed form by a direct differentiation. We used the following standard gradient descent algorithm,
which depends on the initial values $(s^{(0)}, a_0^{(0)}, a_1^{(0)})$, a sequence of step sizes $\{t^{(i)}\}$, and a termination parameter $\epsilon$:

1) Set $i = 0$ and initialize $v^{(0)} = [s^{(0)} \;\; a_0^{(0)} \;\; a_1^{(0)}]^T$;

2) Set $v^{(i+1)} = v^{(i)} - t^{(i)} \nabla I(v^{(i)})$;

3) If $\|\nabla I(v^{(i+1)})\| < \epsilon$ then terminate; otherwise, increment $i$ and return to Step 2.

We used the initial parameters $(s^{(0)}, a_0^{(0)}, a_1^{(0)}) = (1, 0, 0)$, a fixed step size $t^{(i)} = 1$, and a termination parameter
$\epsilon = 10^{-6}$.

Note that we have ignored the constraint $s \ge 0$, but this has no effect on the maximization. This is seen by
noting that $s$ is a Lagrange multiplier corresponding to the constraint on the metric in the primal expression for the LM rate, and the inequality therein
can be replaced by an equality as long as $I_{\mathrm{LM}}(Q) > 0$ [3, Lemma 1]. The equality constraint yields a Lagrange
multiplier on $\mathbb{R}$, rather than $\mathbb{R}_+$.
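
A minimal Python sketch of this procedure is given below. It is not the authors' C implementation [14]. The objective is the standard dual form $\sum_{x,y} Q(x) W(y|x) \log \frac{q(x,y)^s e^{a_x}}{\sum_{\bar{x}} Q(\bar{x}) q(\bar{x},y)^s e^{a_{\bar{x}}}}$; since this is a supremum, we assume the descent step is applied to the negated objective. The iteration cap `max_iter`, the stopping check at the start of each iteration, and the toy channel in the usage example are our own additions.

```python
from math import exp, log, sqrt


def lm_dual_objective_and_gradient(Q, W, q, v):
    """Dual LM objective at v = [s, a_0, ..., a_{|X|-1}] and its gradient.

    Q[x]: input distribution, W[x][y]: channel, q[x][y] > 0: decoding metric.
    """
    s, a = v[0], v[1:]
    nx, ny = len(Q), len(W[0])
    value, grad = 0.0, [0.0] * len(v)
    for y in range(ny):
        p_y = sum(Q[x] * W[x][y] for x in range(nx))
        if p_y == 0.0:
            continue
        # Tilted weights Q(x) q(x,y)^s e^{a_x} and their normalization.
        w = [Q[x] * q[x][y] ** s * exp(a[x]) for x in range(nx)]
        z = sum(w)
        mean_log_q = sum(w[x] * log(q[x][y]) for x in range(nx)) / z
        for x in range(nx):
            joint = Q[x] * W[x][y]
            if joint > 0.0:
                value += joint * (s * log(q[x][y]) + a[x] - log(z))
                grad[0] += joint * (log(q[x][y]) - mean_log_q)
                grad[1 + x] += joint
            grad[1 + x] -= p_y * w[x] / z
    return value, grad


def gradient_descent(Q, W, q, step=1.0, eps=1e-6, max_iter=100000):
    """Steps 1)-3) applied to the negated objective; returns (v, value, iterations)."""
    v = [1.0] + [0.0] * len(Q)  # (s, a_0, a_1, ...) = (1, 0, 0, ...)
    for i in range(max_iter):
        value, grad = lm_dual_objective_and_gradient(Q, W, q, v)
        if sqrt(sum(g * g for g in grad)) < eps:
            return v, value, i
        # v - t * grad(-I) is the same as v + t * grad(I).
        v = [vi + step * gi for vi, gi in zip(v, grad)]
    value, _ = lm_dual_objective_and_gradient(Q, W, q, v)
    return v, value, max_iter


if __name__ == "__main__":
    # Hypothetical toy channel; with the matched metric q = W the initial
    # point (s, a) = (1, 0) is already stationary.
    Q = [0.5, 0.5]
    W = [[0.9, 0.1], [0.2, 0.8]]
    print(gradient_descent(Q, W, W))
```

With a matched metric the returned value is the mutual information of the toy channel; for a mismatched metric, any value returned is a lower bound on $I_{\mathrm{LM}}(Q)$, in line with the remark above that fixed dual parameters yield lower bounds.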

B. Evaluating $I_{\mathrm{SC}}^{(2)}(Q_{UX})$ via the Dual Expression and Gradient Descent

To obtain the dual curve for $I_{\mathrm{SC}}^{(2)}$ in Figure 1, we optimized the parameters in (7)–(8) in a similar fashion to the
previous subsection. In fact, the optimization of (7) was done exactly as above, with the same initial parameters (i.e.
initializing $s = 1$ and $a(u,x) = 0$ for all $(u,x)$). By letting (7) hold with equality, the solution to this optimization
gives a value for $R_1$.

Handling the optimization in (8) was less straightforward. We were unable to verify the joint concavity of the
objective in $(\rho_1, s, a)$, and we in fact found a naive application of gradient descent to be problematic due to overly
large changes in $\rho_1$ on each step. Moreover, while it is safe to ignore the constraint $s \ge 0$ in the same way as
the previous subsection, the same is not true of the constraint $\rho_1 \in [0,1]$. We proceed by presenting a modified
algorithm that handles these issues.

Similarly to the previous subsection, we let $v$ be the vector of parameters, let $I_0(v)$ denote the objective in (8)
with $Q_{UX}$ and $R_1$ fixed (the latter chosen as the value given by the evaluation of (7)), and let $\nabla I_0(v)$ be the
corresponding gradient vector. Moreover, we define

$$\Phi(\rho_1) = \begin{cases} 0 & \rho_1 < 0 \\ \rho_1 & \rho_1 \in [0,1] \\ 1 & \rho_1 > 1. \end{cases} \tag{74}$$

Finally, we let $v_{-\rho_1}$ denote the vector $v$ with the entry corresponding to $\rho_1$ removed, and similarly for other vectors
(e.g. $(\nabla I_0(v))_{-\rho_1}$).

We applied the following variation of gradient descent, which depends on the initial parameters, the step sizes
$\{t^{(i)}\}$, and two parameters $\epsilon$ and $\epsilon'$:

1) Set $i = 0$ and initialize $v^{(0)}$.

2) Set $v^{(i+1)}_{-\rho_1} = v^{(i)}_{-\rho_1} - t^{(i)} (\nabla I_0(v^{(i)}))_{-\rho_1}$.

3) If $\|(\nabla I_0(v^{(i)}))_{-\rho_1}\| < \epsilon'$ then set $\rho_1^{(i+1)} = \Phi\big(\rho_1^{(i)} - t^{(i)} \frac{\partial I_0}{\partial \rho_1}\big|_{v = v^{(i)}}\big)$; otherwise set $\rho_1^{(i+1)} = \rho_1^{(i)}$.

4) Terminate if either of the following conditions hold: (i) $\|\nabla I_0(v^{(i+1)})\| < \epsilon$; (ii) $\|(\nabla I_0(v^{(i+1)}))_{-\rho_1}\| < \epsilon$ and
$\rho_1^{(i+1)} \in \{0, 1\}$. Otherwise, increment $i$ and return to Step 2.

In words, $\rho_1$ is only updated if the norm of the gradient corresponding to $(s, a)$ is sufficiently small, and the
algorithm may terminate when $\rho_1$ saturates to one of the two endpoints of $[0,1]$ (rather than arriving at a local
maximum). We initialized $s$ and $\rho_1$ to 1, and each $a(u,x)$ to zero. We again used a constant step size $t^{(i)} = 1$, and
we chose the parameters $\epsilon = 10^{-6}$ and $\epsilon' = 10^{-2}$.
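
The control flow of Steps 1)–4) can be sketched as follows in Python. To keep the example self-contained, the superposition coding objective is replaced by a hypothetical quadratic toy function whose unconstrained minimizer has $\rho_1 = 1.5$, so that the constraint is active. The function names, the toy gradient, and the iteration cap `max_iter` are assumptions, and the caller is assumed to supply the gradient of the function being minimized.

```python
from math import sqrt


def clip_unit(rho):
    """The map in (74): project rho onto the interval [0, 1]."""
    return 0.0 if rho < 0.0 else (1.0 if rho > 1.0 else rho)


def modified_descent(grad, rho, w, step=1.0, eps=1e-6, eps_prime=1e-2, max_iter=100000):
    """Steps 1)-4) for a function with gradient grad(rho, w) -> (g_rho, g_w).

    `w` collects the parameters other than rho (s and a in the text).
    """
    for i in range(max_iter):
        g_rho, g_w = grad(rho, w)
        # Step 2: always update the parameters other than rho.
        new_w = [wk - step * gk for wk, gk in zip(w, g_w)]
        # Step 3: update rho only once the other parameters have nearly settled.
        if sqrt(sum(g * g for g in g_w)) < eps_prime:
            rho = clip_unit(rho - step * g_rho)
        w = new_w
        # Step 4: stop at a stationary point, or when rho sits at an endpoint.
        g_rho, g_w = grad(rho, w)
        norm_w = sqrt(sum(g * g for g in g_w))
        at_endpoint = rho == 0.0 or rho == 1.0
        if sqrt(g_rho ** 2 + norm_w ** 2) < eps or (norm_w < eps and at_endpoint):
            return rho, w, i + 1
    return rho, w, max_iter


def toy_grad(rho, w):
    """Gradient of the toy function 0.25 (rho - 1.5)^2 + 0.25 (w_0 - 2)^2."""
    return 0.5 * (rho - 1.5), [0.5 * (w[0] - 2.0)]


if __name__ == "__main__":
    print(modified_descent(toy_grad, 0.5, [0.0]))
```

In the toy run, $\rho_1$ is frozen until the remaining coordinate has nearly converged, is then pushed to the endpoint 1 by the projection, and the algorithm stops through condition (ii).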


C. Evaluating $I_{\mathrm{SC}}^{(2)}(Q_{UX})$ via the Primal Expression

Since we only computed the primal expression for $I_{\mathrm{SC}}^{(2)}(Q_{UX})$ with a relatively small number of input distributions
(namely, those shown in Figure 1), computational complexity was a minor issue, so we resorted to the
general-purpose software CVX for MATLAB [20]. In the same way as the previous subsection, we solved the right-hand
side of (5) to find $R_1$, then substituted the resulting value into (6) to find $R_0$.


Appendix C 

Achievability of (11) via Expurgated Parallel Coding

Here we outline how the achievable rate of 0.137998 nats/use in (11) can be obtained using Lapidoth's expurgated
parallel coding rate. We verified this value by evaluating the primal expressions in [9] using CVX [20], and also
by evaluating the equivalent dual expressions in [5] by a suitable adaptation of the dual optimization parameters
for superposition coding given in Section III-D. We focus our attention on the latter, since it immediately provides
a concrete lower bound even when the optimization parameters are slightly suboptimal.

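The parameters to Lapidoth's rate are two finite alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$, two corresponding input distributions $Q_1$
and $Q_2$, and a function $\phi(x_1, x_2)$ mapping $\mathcal{X}_1$ and $\mathcal{X}_2$ to the channel input alphabet. For any such parameters, the
rate $R = R_1 + R_2$ is achievable provided that [6], [8]

$$R_1 \le \sup_{s \ge 0,\, a(\cdot,\cdot)} \mathbb{E}\left[\log \frac{q(\phi(X_1, X_2), Y)^s e^{a(X_1, X_2)}}{\mathbb{E}\big[q(\phi(\overline{X}_1, X_2), Y)^s e^{a(\overline{X}_1, X_2)} \,\big|\, X_2, Y\big]}\right] \tag{75}$$

$$R_2 \le \sup_{s \ge 0,\, a(\cdot,\cdot)} \mathbb{E}\left[\log \frac{q(\phi(X_1, X_2), Y)^s e^{a(X_1, X_2)}}{\mathbb{E}\big[q(\phi(X_1, \overline{X}_2), Y)^s e^{a(X_1, \overline{X}_2)} \,\big|\, X_1, Y\big]}\right], \tag{76}$$

and at least one of the following holds:

$$R_1 \le \sup_{\rho_2 \in [0,1],\, s \ge 0,\, a(\cdot,\cdot)} \mathbb{E}\left[\log \frac{\big(q(\phi(X_1, X_2), Y)^s e^{a(X_1, X_2)}\big)^{\rho_2}}{\mathbb{E}\Big[\big(\mathbb{E}\big[q(\phi(\overline{X}_1, \overline{X}_2), Y)^s e^{a(\overline{X}_1, \overline{X}_2)} \,\big|\, \overline{X}_1\big]\big)^{\rho_2} \,\Big|\, Y\Big]}\right] - \rho_2 R_2 \tag{77}$$

$$R_2 \le \sup_{\rho_1 \in [0,1],\, s \ge 0,\, a(\cdot,\cdot)} \mathbb{E}\left[\log \frac{\big(q(\phi(X_1, X_2), Y)^s e^{a(X_1, X_2)}\big)^{\rho_1}}{\mathbb{E}\Big[\big(\mathbb{E}\big[q(\phi(\overline{X}_1, \overline{X}_2), Y)^s e^{a(\overline{X}_1, \overline{X}_2)} \,\big|\, \overline{X}_2\big]\big)^{\rho_1} \,\Big|\, Y\Big]}\right] - \rho_1 R_1, \tag{78}$$

where $(X_1, X_2, Y, \overline{X}_1, \overline{X}_2) \sim Q_1(x_1) Q_2(x_2) W(y|\phi(x_1, x_2)) Q_1(\overline{x}_1) Q_2(\overline{x}_2)$.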

Recall the input distribution $Q_{UX}$ for superposition coding on the second-order product channel given in (47)–(49).
Denoting the four inputs of the product channel as $\{(0,0), (0,1), (1,0), (1,1)\}$, we set $\mathcal{X}_1 = \{(0,0), (0,1), (1,0)\}$,
$\mathcal{X}_2 = \mathcal{U} = \{0, 1\}$, and

$$Q_{X_1} = \frac{1}{1 - Q_1^2} \begin{bmatrix} Q_0^2 & Q_0 Q_1 & Q_0 Q_1 \end{bmatrix} \tag{79}$$

$$Q_{X_2} = \begin{bmatrix} 1 - Q_1^2 & Q_1^2 \end{bmatrix} \tag{80}$$

$$\phi(x_1, x_2) = \begin{cases} x_1 & x_2 = 0 \\ (1,1) & x_2 = 1. \end{cases} \tag{81}$$

This induces a joint distribution $Q_{X_1 X_2 X}(x_1, x_2, x) = Q_{X_1}(x_1) Q_{X_2}(x_2) \mathbb{1}\{x = \phi(x_1, x_2)\}$. The idea behind this
choice is that the marginal distribution $Q_{X_2 X}$ coincides with our choice of $Q_{UX}$ for SC.
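
The induced distribution can be tabulated exactly for any binary input distribution. The Python sketch below does so with rational arithmetic, assuming that $\phi$ passes $x_1$ through when $x_2 = 0$ and returns $(1,1)$ when $x_2 = 1$. The value $(Q_0, Q_1) = (3/4, 1/4)$ is purely illustrative and is not claimed to be the one used in the counter-example.

```python
from fractions import Fraction

# Illustrative binary input distribution (an assumption, not the paper's value).
Q0, Q1 = Fraction(3, 4), Fraction(1, 4)

X1 = [(0, 0), (0, 1), (1, 0)]
# Distributions on the two alphabets, following (79) and (80).
Q_X1 = {x1: p / (1 - Q1**2) for x1, p in zip(X1, [Q0**2, Q0 * Q1, Q0 * Q1])}
Q_X2 = {0: 1 - Q1**2, 1: Q1**2}


def phi(x1, x2):
    """Assumed form of the map in (81): x1 when x2 = 0, and (1, 1) when x2 = 1."""
    return x1 if x2 == 0 else (1, 1)


# Marginal of (X2, X) under Q_X1(x1) Q_X2(x2) 1{x = phi(x1, x2)}.
Q_X2X = {}
for x1, p1 in Q_X1.items():
    for x2, p2 in Q_X2.items():
        key = (x2, phi(x1, x2))
        Q_X2X[key] = Q_X2X.get(key, Fraction(0)) + p1 * p2

# Marginal of the product-channel input X alone.
Q_X = {}
for (x2, x), p in Q_X2X.items():
    Q_X[x] = Q_X.get(x, Fraction(0)) + p

if __name__ == "__main__":
    print(Q_X2X)
    print(Q_X)
```

Under these assumptions the marginal of $X$ is the product distribution $[Q_0^2 \;\; Q_0 Q_1 \;\; Q_0 Q_1 \;\; Q_1^2]$, and all of the mass with $x_2 = 1$ sits on the input $(1,1)$.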

By the structure of our input distributions, there is in fact a one-to-one correspondence between $(u, x)$ and
$(x_1, x_2)$, thus allowing us to immediately use the dual parameters $(s, a, \rho_1)$ from SC for the expurgated parallel
coding rate. More precisely, using the superscripts $(\cdot)^{\mathrm{sc}}$ and $(\cdot)^{\mathrm{ex}}$ to distinguish between the two ensembles, we set

$$R_1^{\mathrm{ex}} = R_1^{\mathrm{sc}} \tag{82}$$

$$R_2^{\mathrm{ex}} = R_0^{\mathrm{sc}} \tag{83}$$

$$s^{\mathrm{ex}} = s^{\mathrm{sc}} \tag{84}$$

$$a^{\mathrm{ex}}(x_1, x_2) = a^{\mathrm{sc}}(x_2, \phi(x_1, x_2)) \tag{85}$$

$$\rho_1^{\mathrm{ex}} = \rho_1^{\mathrm{sc}}. \tag{86}$$

Using these identifications along with the choices of the superposition coding parameters in (52)–(56), we verified
numerically that the right-hand side of (75) (respectively, (78)) coincides with that of (7) (respectively, (8)). Finally,
to conclude that the expurgated parallel coding rate recovers (11), we numerically verified that the rate $R_2$ resulting
from (75) and (78) (which, from (51), is 0.0356005) also satisfies (76). In fact, the inequality is strict, with the
right-hand side of (76) being at least 0.088.